
Vizier output transforms #2643

Draft · wants to merge 24 commits into main

Conversation

@CompRhys (Contributor) commented Dec 3, 2024

Motivation

I wanted to experiment with the Vizier output transforms in BoTorch, in particular the infeasible transform. This may be misplaced effort, since within the Meta ecosystem Ax appears to handle this sort of thing, so these transforms may not end up being used very much, but it seemed worth upstreaming.

Takes the transforms from: https://arxiv.org/abs/2408.11527

Test Plan

Added tests for all functionality; not sure I am at 100% coverage.

Related PRs

This branch takes off from #2636, so that would need to be merged first.

@facebook-github-bot added the CLA Signed label Dec 3, 2024
@saitcakmak (Contributor) left a comment:


Thanks for adding these. I've been curious to try out the Vizier transforms. These transforms try to ensure we model the good regions better, which seems better than trying to model everything equally well, when the goal is to optimize a function. Have you tried these on any benchmarks yet? It'd be good to see how they work e2e.

None of these transforms implement the untransform_posterior method, which is necessary for the transforms to work as an outcome_transform in BoTorch models. Without it, Model.posterior will error out. Would it be possible to implement untransform_posterior, likely returning a TransformedPosterior object?

None of the transforms seem to support Yvar either. We use BoTorch through Ax and try to keep the same methods regardless of whether the noise is observed. What does Vizier do with observation noise? Is it simply ignored, or is this just a limitation of the current implementation?

Comment on lines 896 to 899
assert transform._batch_shape == batch_shape
assert not transform._is_trained
assert transform._shift is None
assert torch.isnan(transform.warped_bad_value)
Contributor


We tend to use self.assertEqual(...), self.assertTrue/False(...), etc. from TestCase rather than bare assert in tests.
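
For example, the quoted assertions would become:

    self.assertEqual(transform._batch_shape, batch_shape)
    self.assertFalse(transform._is_trained)
    self.assertIsNone(transform._shift)
    self.assertTrue(torch.isnan(transform.warped_bad_value))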

(Three resolved review threads on botorch/models/transforms/outcome.py are collapsed; two are marked outdated.)
Comment on lines 900 to 903
if Yvar is not None:
    raise NotImplementedError(
        "InfeasibleTransform does not support transforming observation noise"
    )
Contributor


It'd be great to support Yvar as well. Could we just set it to the mean of all Yvars? That would only minimally affect the model predictions and let the code run.
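
As a rough sketch of that suggestion (my reading of it, assuming Yvar has shape batch_shape x n x m), the quoted check could be replaced with something like:

    if Yvar is not None:
        # use the per-output mean noise level for every observation
        Yvar = Yvar.mean(dim=-2, keepdim=True).expand_as(Yvar)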

(Another resolved, outdated review thread on botorch/models/transforms/outcome.py is collapsed.)
y = Y_transformed[*batch_idx, :, dim]

# Get finite values and their ranks for each batch
is_finite_mask = ~torch.isnan(y)
Contributor


BoTorch models do not support NaN inputs. If we didn't need to deal with NaN and somehow avoided unique, could we avoid the double for loop here?

Contributor Author


I think this is the salient point for me, and it indicates that I should probably have done this in Ax. Would Ax error out if a metric returned NaN before applying transforms, or only after?

Contributor


I believe the error would come after, though I would have to double check. IIRC, the NaN will propagate down to BoTorch and raise an error there.

I am planning a major refactor of data handling within the Ax ModelBridge layer & transforms. The goal is to reduce looping to make it more efficient & scalable. That said, if you want to add these in Ax now, it doesn't have to be blocked on the refactor.

Comment on lines 1176 to 1177
for i, val in enumerate(unique_y):
    ranks[y == val] = i + 1
Contributor


If we didn't care about uniqueness, we could get the rank / order using torch.argsort. IMO, duplicate observations should be pretty rare in practice and are probably safe to ignore; they're probably mostly introduced by the infeasibility warper.
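
As an illustration of the argsort idea (my sketch, not the PR's code):

    import torch

    y = torch.tensor([0.3, -1.2, 0.7, 2.5])
    # 1-based ranks via the double-argsort trick; ties would get distinct, arbitrary ranks
    ranks = torch.argsort(torch.argsort(y)) + 1  # tensor([2, 1, 3, 4])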

Contributor


I guess we could also find the median using the middle index if we simply sorted Y from the start.
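
Something along these lines (illustrative only):

    y_sorted, _ = torch.sort(y)
    median = y_sorted[(y_sorted.numel() - 1) // 2]  # lower middle element when n is even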

Contributor Author


I think it's worth having this generality in case other packages like BoFire want to make use of these methods. I tried to stick as close to the reference implementation as possible while expanding it to handle multiple output dimensions and batch_size.

(A resolved review thread on botorch/models/transforms/outcome.py is collapsed.)

# Process values below median
below_median = y < self._original_label_medians[*batch_idx, dim]
if below_median.any():
Contributor


This is quite a complicated process to untransform. I find it quite difficult to review since I don't know what the algorithm is supposed to do.

Contributor Author


The HalfRankTransform splits the data at the median and then warps the bottom half onto a Gaussian whose standard deviation matches that of the top half of the distribution. On inverting there are three cases: a) the value is very close to a known value from the fit, so we can just look it up; b) inside the support, we take a linear interpolation; c) outside the support, we take a linear extrapolation based on the entire data range.
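
To make the three cases concrete, here is a minimal sketch of that inverse logic (my own illustration, not the PR's implementation; `warped` and `original` are assumed to be 1-d tensors of the fitted warped values and the corresponding original labels, sorted by `warped`):

    import torch

    def approx_unwarp(y_new, warped, original, atol=1e-6):
        out = torch.empty_like(y_new)
        for i, v in enumerate(y_new):
            diffs = (warped - v).abs()
            j = int(torch.argmin(diffs))
            if diffs[j] < atol:
                # (a) very close to a known fitted value: direct lookup
                out[i] = original[j]
            elif warped[0] < v < warped[-1]:
                # (b) inside the support: linear interpolation between neighbours
                hi = int(torch.searchsorted(warped, v))
                lo = hi - 1
                w = (v - warped[lo]) / (warped[hi] - warped[lo])
                out[i] = original[lo] + w * (original[hi] - original[lo])
            else:
                # (c) outside the support: linear extrapolation over the full data range
                slope = (original[-1] - original[0]) / (warped[-1] - warped[0])
                ref = 0 if v < warped[0] else -1
                out[i] = original[ref] + slope * (v - warped[ref])
        return out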

@CompRhys (Author) commented Dec 3, 2024

> Thanks for adding these. I've been curious to try out the Vizier transforms. These transforms try to ensure we model the good regions better, which seems better than trying to model everything equally well, when the goal is to optimize a function. Have you tried these on any benchmarks yet? It'd be good to see how they work e2e.

I implemented these here before looking into what I would need to do to connect them to the benchmark test case I have been prototyping in Ax. I assumed the best approach would be to implement them lower in the stack and let them bubble up, but it seems I should have implemented them at the Ax level (especially as the harder part here was handling the batch_size), hence the comment about potentially misplaced effort. It may be that these shouldn't be merged here if the details can't be hashed out.

> None of these transforms implement the untransform_posterior method, which is necessary for the transforms to work as an outcome_transform in BoTorch models. Without it, Model.posterior will error out. Would it be possible to implement untransform_posterior, likely returning a TransformedPosterior object?

This should be doable based on the logic in Standardize, just using the sample transform. In general, the bit that makes things annoying here is the batch_size.
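
For reference, a minimal sketch of what that could look like (not the PR's code; `_untransform_samples` is a hypothetical per-sample inverse helper):

    from botorch.posteriors.transformed import TransformedPosterior

    def untransform_posterior(self, posterior):
        # un-warp samples drawn from the transformed-space posterior
        return TransformedPosterior(
            posterior=posterior,
            sample_transform=self._untransform_samples,  # hypothetical helper
        )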

> None of the transforms seem to support Yvar either. We use BoTorch through Ax and try to keep the same methods regardless of whether the noise is observed. What does Vizier do with observation noise? Is it simply ignored, or is this just a limitation of the current implementation?

I think the Vizier approach is similar to performing the transformations at the Ax layer. Their equivalent of Metric only deals with a single float observation, so based on the docs I believe observation noise is simply ignored. I don't think we can implement Yvar for either the LogWarping or the HalfRank transform.

@CompRhys marked this pull request as draft December 3, 2024 20:15
@saitcakmak (Contributor) commented:
Ax vs BoTorch

This is something I've also been thinking about quite a bit lately as part of reworking what Ax uses by default (though this has been rather focused on input transforms so far). Outcome transforms may be slightly simpler in Ax: you technically only need to be able to transform, though we also want some rough untransform method for reporting & cross-validation purposes. In BoTorch, the transform needs to be differentiable, since the untransformed posterior is used to compute the acquisition value, which needs to be differentiated through during optimization. Ax transforms are applied once as a pre-processing step, but BoTorch transforms are applied at each posterior call. There is also the aspect of learnable / tunable transforms in BoTorch, but that doesn't apply to outcome transforms.

If I were to implement Vizier transforms, I'd probably try them in Ax, since there are fewer restrictions.

> In general, the bit that makes things annoying here is the batch_size.

Do you often use batched models? If not, you don't need to support the batch_size arg and different treatment for each batch; you could use the same parameters for all batches.

> I think the Vizier approach is similar to performing the transformations at the Ax layer.

Agreed. It is more of a pre-processing step.

> Their equivalent of Metric only deals with a single float observation, so based on the docs I believe observation noise is simply ignored. I don't think we can implement Yvar for either the LogWarping or the HalfRank transform.

Cool, this is good to know.

@CompRhys (Author) commented Dec 5, 2024

Will move forward with a PR in Ax, as that seems like the faster way to make progress on my use case.

@saitcakmak (Contributor) commented:
Sounds good. StandardizeY or PowerTransformY should be good examples to follow.

3 participants